A Stochastic Morphological Analysis for Japanese employing Character n-Gram and k-NN Method
Abstract
Because Japanese corpora have been developed recently, it has become possible to perform stochastic morphological analysis for Japanese (Nagata, 1994; Takeuchi and Matsumoto, 1995; Mori and Nagao, 1996; Yamamoto et al., 1997). Although the same Hidden Markov Model-based approach used for English is fundamentally applicable given word/part-of-speech n-gram data, some problems peculiar to Japanese make the approach indirect. Before the most likely part-of-speech (abbreviated to 'pos') sequence can be calculated, input sentences must be segmented into morphemes by referring to word dictionaries.
Similar resources
Japanese Unknown Word Identification by Character-based Chunking
We introduce character-based chunking for unknown word identification in Japanese text. A major advantage of our method is its ability to detect low-frequency unknown words of unrestricted character type patterns. The method is built upon SVM-based chunking, using character n-grams and the surrounding context of n-best word segmentation candidates from statistical morphological analysis as feat...
TR99-1756 Unsupervised Statistical Segmentation of Japanese Kanji Strings
Word segmentation is an important issue in Japanese language processing because Japanese is written without space delimiters between words. We propose a simple dictionary-less method to segment Japanese kanji sequences into words based solely on character n-gram counts from an unannotated corpus. The performance was often better than that of rule-based morphological analyzers over a variety of ...
The Design of a Nearest-Neighbor Classifier and Its Use for Japanese Character Recognition
The nearest neighbor (NN) approach is a powerful nonparametric technique for pattern classification tasks. Although the brute-force NN algorithm is simple and has high accuracy, its computation cost is usually very expensive, especially for applications such as Japanese character recognition in which the number of categories is large. Many methods have been proposed to improve the efficiency of NN...
The design of a nearest-neighbor classifier and its use for Japanese character recognition
The nearest neighbor (NN) approach is a powerful nonparametric technique for pattern classification tasks. In this paper, algorithms for prototype reduction, hierarchical prototype organization and fast NN search are described. To remove redundant category prototypes and to avoid redundant comparisons, the algorithms exploit geometrical information of a given prototype set which is represented a...
Publication date: 1997